变异推理(VI)的核心原理是将计算复杂后概率密度计算的统计推断问题转换为可拖动的优化问题。该属性使VI比几种基于采样的技术更快。但是,传统的VI算法无法扩展到大型数据集,并且无法轻易推断出越野数据点,而无需重新运行优化过程。该领域的最新发展,例如随机,黑框和摊销VI,已帮助解决了这些问题。如今,生成的建模任务广泛利用摊销VI来实现其效率和可扩展性,因为它利用参数化函数来学习近似的后验密度参数。在本文中,我们回顾了各种VI技术的数学基础,以构成理解摊销VI的基础。此外,我们还概述了最近解决摊销VI问题的趋势,例如摊销差距,泛化问题,不一致的表示学习和后验崩溃。最后,我们分析了改善VI优化的替代差异度量。
translated by 谷歌翻译
本文提出了一种名为定位变压器(LOTR)的新型变压器的面部地标定位网络。所提出的框架是一种直接坐标回归方法,利用变压器网络以更好地利用特征图中的空间信息。 LOTR模型由三个主要模块组成:1)将输入图像转换为特征图的视觉骨干板,2)改进Visual Backone的特征表示,以及3)直接预测的地标预测头部的变压器模块来自变压器的代表的地标坐标。给定裁剪和对齐的面部图像,所提出的LOTR可以训练结束到底,而无需任何后处理步骤。本文还介绍了光滑翼损失功能,它解决了机翼损耗的梯度不连续性,导致比L1,L2和机翼损耗等标准损耗功能更好地收敛。通过106点面部地标定位的第一个大挑战提供的JD地标数据集的实验结果表明了LOTR在排行榜上的现有方法和最近基于热爱的方法的优势。在WFLW DataSet上,所提出的Lotr框架与若干最先进的方法相比,展示了有希望的结果。此外,我们在使用我们提出的LOTRS面向对齐时,我们报告了最先进的面部识别性能的提高。
translated by 谷歌翻译
最近在体现AI中的研究已经通过使用模拟环境来开发和培训机器人学习方法。然而,使用模拟已经引起了只需要机器人模拟器可以模拟的任务:运动和物理接触的任务。我们呈现IGIBSON 2.0,一个开源仿真环境,通过三个关键创新支持模拟更多样化的家庭任务。首先,IGIBSON 2.0支持对象状态,包括温度,湿度水平,清洁度和切割和切片状态,以涵盖更广泛的任务。其次,IGIBSON 2.0实现了一组谓词逻辑函数,该逻辑函数将模拟器状态映射到烹饪或浸泡等逻辑状态。另外,给定逻辑状态,IGIBSON 2.0可以对满足它的有效物理状态进行示例。此功能可以以最少的努力从用户生成潜在的无限实例。采样机制允许我们的场景在语义有意义的位置中的小对象更密集地填充。第三,IGIBSON 2.0包括虚拟现实(VR)界面,以将人类浸入其场景以收集示威操作。因此,我们可以从这些新型任务中收集人类的示威活动,并使用它们进行模仿学习。我们评估了IGIBSON 2.0的新功能,以实现新的任务的机器人学习,希望能够展示这一新模拟器的潜力来支持体现AI的新研究。 IGIBSON 2.0及其新数据集可在http://svl.stanford.edu/igibson/上公开提供。
translated by 谷歌翻译
Multiple studies have focused on predicting the prospective popularity of an online document as a whole, without paying attention to the contributions of its individual parts. We introduce the task of proactively forecasting popularities of sentences within online news documents solely utilizing their natural language content. We model sentence-specific popularity forecasting as a sequence regression task. For training our models, we curate InfoPop, the first dataset containing popularity labels for over 1.7 million sentences from over 50,000 online news documents. To the best of our knowledge, this is the first dataset automatically created using streams of incoming search engine queries to generate sentence-level popularity annotations. We propose a novel transfer learning approach involving sentence salience prediction as an auxiliary task. Our proposed technique coupled with a BERT-based neural model exceeds nDCG values of 0.8 for proactive sentence-specific popularity forecasting. Notably, our study presents a non-trivial takeaway: though popularity and salience are different concepts, transfer learning from salience prediction enhances popularity forecasting. We release InfoPop and make our code publicly available: https://github.com/sayarghoshroy/InfoPopularity
translated by 谷歌翻译
Sarcasm is a form of irony that involves saying or writing something that is opposite or opposite to what one really means, often in a humorous or mocking way. It is often used to mock or mock someone or something, or to be humorous or amusing. Sarcasm is usually conveyed through tone of voice, facial expressions, or other forms of nonverbal communication, but it can also be indicated by the use of certain words or phrases that are typically associated with irony or humor. Sarcasm detection is difficult because it relies on context and non-verbal cues. It can also be culturally specific, subjective and ambiguous. In this work, we fine-tune the RoBERTa based sarcasm detection model presented in Abaskohi et al. [2022] to get to within 0.02 F1 of the state-of-the-art (Hercog et al. [2022]) on the iSarcasm dataset (Oprea and Magdy [2019]). This performance is achieved by augmenting iSarcasm with a pruned version of the Self Annotated Reddit Corpus (SARC) (Khodak et al. [2017]). Our pruned version is 100 times smaller than the subset of SARC used to train the state-of-the-art model.
translated by 谷歌翻译
Speech systems are sensitive to accent variations. This is especially challenging in the Indian context, with an abundance of languages but a dearth of linguistic studies characterising pronunciation variations. The growing number of L2 English speakers in India reinforces the need to study accents and L1-L2 interactions. We investigate the accents of Indian English (IE) speakers and report in detail our observations, both specific and common to all regions. In particular, we observe the phonemic variations and phonotactics occurring in the speakers' native languages and apply this to their English pronunciations. We demonstrate the influence of 18 Indian languages on IE by comparing the native language pronunciations with IE pronunciations obtained jointly from existing literature studies and phonetically annotated speech of 80 speakers. Consequently, we are able to validate the intuitions of Indian language influences on IE pronunciations by justifying pronunciation rules from the perspective of Indian language phonology. We obtain a comprehensive description in terms of universal and region-specific characteristics of IE, which facilitates accent conversion and adaptation of existing ASR and TTS systems to different Indian accents.
translated by 谷歌翻译
Heteroscedastic regression models a Gaussian variable's mean and variance as a function of covariates. Parametric methods that employ neural networks for these parameter maps can capture complex relationships in the data. Yet, optimizing network parameters via log likelihood gradients can yield suboptimal mean and uncalibrated variance estimates. Current solutions side-step this optimization problem with surrogate objectives or Bayesian treatments. Instead, we make two simple modifications to optimization. Notably, their combination produces a heteroscedastic model with mean estimates that are provably as accurate as those from its homoscedastic counterpart (i.e.~fitting the mean under squared error loss). For a wide variety of network and task complexities, we find that mean estimates from existing heteroscedastic solutions can be significantly less accurate than those from an equivalently expressive mean-only model. Our approach provably retains the accuracy of an equally flexible mean-only model while also offering best-in-class variance calibration. Lastly, we show how to leverage our method to recover the underlying heteroscedastic noise variance.
translated by 谷歌翻译
In molecular research, simulation \& design of molecules are key areas with significant implications for drug development, material science, and other fields. Current classical computational power falls inadequate to simulate any more than small molecules, let alone protein chains on hundreds of peptide. Therefore these experiment are done physically in wet-lab, but it takes a lot of time \& not possible to examine every molecule due to the size of the search area, tens of billions of dollars are spent every year in these research experiments. Molecule simulation \& design has lately advanced significantly by machine learning models, A fresh perspective on the issue of chemical synthesis is provided by deep generative models for graph-structured data. By optimising differentiable models that produce molecular graphs directly, it is feasible to avoid costly search techniques in the discrete and huge space of chemical structures. But these models also suffer from computational limitations when dimensions become huge and consume huge amount of resources. Quantum Generative machine learning in recent years have shown some empirical results promising significant advantages over classical counterparts.
translated by 谷歌翻译
Developing and least developed countries face the dire challenge of ensuring that each child in their country receives required doses of vaccination, adequate nutrition and proper medication. International agencies such as UNICEF, WHO and WFP, among other organizations, strive to find innovative solutions to determine which child has received the benefits and which have not. Biometric recognition systems have been sought out to help solve this problem. To that end, this report establishes a baseline accuracy of a commercial contactless palmprint recognition system that may be deployed for recognizing children in the age group of one to five years old. On a database of contactless palmprint images of one thousand unique palms from 500 children, we establish SOTA authentication accuracy of 90.85% @ FAR of 0.01%, rank-1 identification accuracy of 99.0% (closed set), and FPIR=0.01 @ FNIR=0.3 for open-set identification using PalmMobile SDK from Armatura.
translated by 谷歌翻译
Selective classification involves identifying the subset of test samples that a model can classify with high accuracy, and is important for applications such as automated medical diagnosis. We argue that this capability of identifying uncertain samples is valuable for training classifiers as well, with the aim of building more accurate classifiers. We unify these dual roles by training a single auxiliary meta-network to output an importance weight as a function of the instance. This measure is used at train time to reweight training data, and at test-time to rank test instances for selective classification. A second, key component of our proposal is the meta-objective of minimizing dropout variance (the variance of classifier output when subjected to random weight dropout) for training the metanetwork. We train the classifier together with its metanetwork using a nested objective of minimizing classifier loss on training data and meta-loss on a separate meta-training dataset. We outperform current state-of-the-art on selective classification by substantial margins--for instance, upto 1.9% AUC and 2% accuracy on a real-world diabetic retinopathy dataset. Finally, our meta-learning framework extends naturally to unsupervised domain adaptation, given our unsupervised variance minimization meta-objective. We show cumulative absolute gains of 3.4% / 3.3% accuracy and AUC over the other baselines in domain shift settings on the Retinopathy dataset using unsupervised domain adaptation.
translated by 谷歌翻译